Smart Paper Notes

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao

Category: Natural Language Processing

2022-12-23

Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed a wealth of relevant information about migraine therapies and the patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.
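For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score is computed for a binary "self-report" classifier. The labels and predictions are made-up toy values, not data from the paper, and the function name `f1_score` is only illustrative.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): 1 = self-reported migraine post, 0 = other.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
score = f1_score(y_true, y_pred)
print(round(score, 2))
```

F1 balances precision and recall, which matters when self-reports are a minority of the collected posts.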

translated by Google Translate

Meta Sparse Principle Component Analysis

Imon Banerjee, Jean Honorio

Category: Machine Learning (Statistics) | Machine Learning

2022-08-18

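We study meta-learning for support recovery (i.e., recovery of the set of non-zero entries) in high-dimensional principal component analysis. We reduce the sufficient sample complexity in a novel task by using information learned from auxiliary tasks. We assume that each task is a different random principal component (PC) matrix with a possibly different support, and that the support union of the PC matrices is small. We then pool the data from all tasks to perform an improper estimation of a single PC matrix by maximizing the $l_1$-regularized predictive covariance, and establish that, with high probability, the true support union can be recovered given a sufficient number of tasks $m$ and a sufficient number of samples $O\left(\frac{\log(p)}{m}\right)$ per task, for $p$-dimensional vectors. Then, for a novel task, we prove that maximizing the $l_1$-regularized predictive covariance, with the additional constraint that the support is a subset of the estimated support union, reduces the sufficient sample complexity of successful support recovery to $O(\log |J|)$, where $J$ is the support union recovered from the auxiliary tasks. Typically, $|J|$ is much smaller than $p$ for sparse matrices. Finally, we validate our findings with numerical simulations.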


Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics

Xiaoyuan Guo, Jiali Duan, C.-C. Jay Kuo, Judy Wawira Gichoya, Imon Banerjee

Category: Computer Vision

2022-07-31

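The language modality in vision-language pretraining frameworks is innately discrete, endowing each word in the vocabulary with a semantic meaning. In contrast, the visual modality is inherently continuous and high-dimensional, which may hinder the alignment and fusion of the vision and language modalities. We therefore propose to "discretize" the visual representation by jointly learning a codebook that assigns a semantic meaning to each visual token. We then use these discretized visual semantics as self-supervised ground truth to build our masked image modeling objective, a counterpart of masked language modeling, which has proven successful for language models. To optimize the codebook, we extend the formulation of VQ-VAE, which provides a theoretical guarantee. Experiments validate the effectiveness of our approach on common vision-language benchmarks.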
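The "discretization" step can be pictured as a nearest-neighbor lookup into a codebook, as in standard vector quantization. The sketch below is a generic illustration under that assumption, not the authors' implementation: the codebook entries, patch features, and the function name `quantize` are all hypothetical, and a real system would learn the codebook jointly with the encoder.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry.

    Distance is squared Euclidean, as in standard vector quantization.
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(f, codebook[k]))
        for f in features
    ]


# Hypothetical 2-D codebook with three entries and four patch features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
patches = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.9], [0.6, 0.6]]

# Each continuous patch feature becomes a discrete token id.
tokens = quantize(patches, codebook)
print(tokens)
```

The resulting token ids play the role that word ids play in masked language modeling: a masked patch can be trained to predict its discrete id.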


Advances in Prediction of Readmission Rates Using Long Term Short Term Memory Networks on Healthcare Insurance Data

Shuja Khalid, Francisco Matos, Ayman Abunimer, Joel Bartlett, Richard Duszak, Michal Horny, Judy Gichoya, Imon Banerjee, Hari Trivedi

Category: Machine Learning | Artificial Intelligence

2022-06-30

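Thirty-day hospital readmission is a long-standing medical problem that affects patient morbidity and mortality and costs billions of dollars annually. Machine learning models have recently been created to predict the risk of inpatient readmission for patients with specific diseases, but no model exists to predict this risk across all patients. We developed a bidirectional long short-term memory (LSTM) network that uses readily available insurance data (inpatient visits, outpatient visits, and drug prescriptions) to predict 30-day readmission for any admitted patient, regardless of the reason for admission. The top-performing model achieved an ROC AUC of 0.763 (0.011) when using historical, inpatient, and post-discharge data. The LSTM model significantly outperformed a baseline random forest classifier, indicating that understanding the sequence of events is important for model prediction. Incorporating 30 days of historical data also significantly improved model performance compared with inpatient data alone, indicating that a patient's clinical history prior to admission, including outpatient visits and pharmacy data, is a strong contributor to readmission. Our results demonstrate that a machine learning model can predict the risk of inpatient readmission with reasonable accuracy for all patients using structured insurance billing data. Because billing data or equivalent surrogates can be extracted from sites, such a model could be deployed to identify patients at risk of readmission before discharge, or to assign more robust follow-up (closer follow-up, home health, mailed medications) to at-risk patients after discharge.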


MedShift: identifying shift data for medical dataset curation

Xiaoyuan Guo, Judy Wawira Gichoya, Hari Trivedi, Saptarshi Purkayastha, Imon Banerjee

Category: Computer Vision

2021-12-27

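To curate a high-quality dataset, identifying the data variance between internal and external sources is a fundamental and crucial step. However, methods for detecting shift or variance in data have not been studied extensively. The challenges are the lack of effective approaches for learning a dense representation of a dataset and the difficulty of sharing private data across medical institutions. To overcome these problems, we propose a unified pipeline called MedShift that detects the top-level shift samples and thereby facilitates medical data curation. Given an internal dataset A as the base source, we first train an anomaly detector for each class of dataset A to learn the internal distributions in an unsupervised way. Second, without exchanging data across sources, we run the trained anomaly detectors on an external dataset B for each class. Data samples with high anomaly scores are identified as shift data. To quantify the shiftness of the external dataset, we cluster B's data into groups, class by class, based on the obtained scores. We then train a multi-class classifier on A and measure the shiftness by the variance in the classifier's performance on B as we gradually drop the group with the largest anomaly score for each class. In addition, we adapt a dataset quality metric to help inspect the distribution differences across multiple medical sources. We verify the efficacy of MedShift on musculoskeletal radiographs (MURA) and chest X-ray datasets from multiple external sources. Experiments show that the proposed shift data detection pipeline can help medical centers curate high-quality datasets more efficiently. An interface introduction video visualizing our results is available at https://youtu.be/v3bf0p1sxqe.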
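The first two steps of the pipeline (fit a per-class detector on internal data, then score external samples without sharing the internal data) can be sketched as follows. This is only an illustration: the abstract does not specify the anomaly detector, so a simple distance-to-class-centroid score is used as a stand-in, and the feature vectors, class names, threshold, and function names are all hypothetical.

```python
from statistics import mean


def fit_class_centroids(internal):
    """internal: dict mapping class name -> list of feature vectors.

    Returns one centroid per class (a stand-in for a trained anomaly detector).
    """
    return {c: [mean(col) for col in zip(*vecs)] for c, vecs in internal.items()}


def anomaly_score(x, centroid):
    """Euclidean distance to the class centroid; larger means more anomalous."""
    return sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5


def flag_shift(external, centroids, threshold):
    """Return indices of external (class, features) samples scoring above threshold."""
    return [
        i
        for i, (c, x) in enumerate(external)
        if anomaly_score(x, centroids[c]) > threshold
    ]


# Hypothetical internal dataset A and external dataset B (2-D features).
internal = {
    "normal": [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]],
    "abnormal": [[1.0, 1.0], [1.2, 1.0]],
}
external = [
    ("normal", [0.1, 0.1]),
    ("normal", [2.0, 2.0]),
    ("abnormal", [1.1, 1.0]),
    ("abnormal", [-1.0, 3.0]),
]

centroids = fit_class_centroids(internal)
shifted = flag_shift(external, centroids, threshold=1.0)
print(shifted)
```

Only the fitted detectors (here, the centroids) need to travel to the external site, which is what lets the scoring run without exchanging the underlying data.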


RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR

Yuyin Zhou, Shih-Cheng Huang, Jason Alan Fries, Alaa Youssef, Timothy J. Amrhein, Marcello Chang, Imon Banerjee, Daniel Rubin, Lei Xing, Nigam Shah

Category: Computer Vision

2021-11-23

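Although radiologists routinely use electronic health record (EHR) data to establish clinical history and inform image interpretation, most deep learning architectures for medical imaging are unimodal, i.e., they learn features only from pixel-level information. Recent research showing that race can be recovered from pixel data alone highlights the potential for serious bias in models that fail to account for demographics and other key patient attributes. Yet the lack of imaging datasets that capture clinical context, including demographics and longitudinal medical history, has left multimodal medical imaging underexplored. To better assess these challenges, we present RadFusion, a multimodal benchmark dataset of 1794 patients with corresponding EHR data and high-resolution computed tomography (CT) scans labeled for pulmonary embolism. We evaluate several representative multimodal fusion models and benchmark their fairness properties across protected subgroups, e.g., gender, race/ethnicity, and age. Our results suggest that integrating imaging and EHR data can improve classification performance and robustness without introducing large disparities in the true positive rate between population groups.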
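The fairness criterion mentioned above, disparity in true positive rate (TPR) between subgroups, can be computed as in the following minimal sketch. The labels, predictions, and group names are invented toy values, not RadFusion data, and `tpr_by_group` is an illustrative helper name.

```python
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """True positive rate per subgroup: detected positives / all positives."""
    positives = defaultdict(int)
    hits = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            hits[g] += int(p == 1)
    # Groups with no positive cases are omitted (their TPR is undefined).
    return {g: hits[g] / positives[g] for g in positives}


# Toy data (hypothetical): 1 = pulmonary embolism present.
y_true = [1, 1, 0, 1, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = tpr_by_group(y_true, y_pred, groups)
# The TPR gap is the largest difference between any two subgroups.
tpr_gap = max(rates.values()) - min(rates.values())
print(rates, tpr_gap)
```

A small gap means the model misses true cases at similar rates across subgroups, which is the property the abstract reports for the fused models.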


Two-step adversarial debiasing with partial learning -- medical image case-studies

Ramon Correa, Jiwoong Jason Jeong, Bhavik Patel, Hari Trivedi, Judy W. Gichoya, Imon Banerjee

Category: Computer Vision | Machine Learning

2021-11-16

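The use of artificial intelligence (AI) in healthcare has become a very active research area in the last few years. Although significant progress has been made in image classification tasks, only a few AI methods are actually deployed in practice. A major hurdle to the active use of clinical AI models is their trustworthiness. More often than not, these complex models are black boxes that produce promising results. When scrutinized, however, they begin to reveal implicit biases in their decision making, such as detecting race and showing bias against ethnic groups and subpopulations. In our ongoing study, we develop a two-step adversarial debiasing approach with partial learning that can reduce racial disparity while preserving performance on the target task. The method has been evaluated on two independent medical image case studies, chest X-rays and mammograms, and has shown promise in reducing bias while maintaining target performance.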


A System-Level View on Out-of-Distribution Data in Robotics

Rohan Sinha, Apoorva Sharma, Somrita Banerjee, Thomas Lew, Rachel Luo, Spencer M. Richards, Yixiao Sun, Edward Schmerling, Marco Pavone

Category: Robotics | Machine Learning

2022-12-28

When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.


Scalable Hybrid Learning Techniques for Scientific Data Compression

Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Jieyang Chen, Scott Klasky, Anand Rangarajan, Sanjay Ranka

Category: Machine Learning

2022-12-21

Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post-process them for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and results in compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
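To make the idea of an error-bounded stage concrete, here is a minimal sketch of one common way to guarantee a pointwise error bound on top of an approximate reconstruction: quantize the residual with bins of width 2·eps. This is a generic uniform quantizer shown for illustration only; the abstract does not say which compressor the authors use, and the data, the `approx` values standing in for an autoencoder's output, and the function names are all hypothetical.

```python
def encode_residual(data, approx, eps):
    """Quantize the residual (data - approx) into integer codes, bin width 2*eps."""
    return [round((d - a) / (2 * eps)) for d, a in zip(data, approx)]


def decode(approx, codes, eps):
    """Reconstruct by adding the de-quantized residual back to the approximation."""
    return [a + 2 * eps * q for a, q in zip(approx, codes)]


# Hypothetical raw values and a lossy approximation (e.g. an autoencoder's output).
data = [1.00, 2.50, -0.75, 3.20]
approx = [0.90, 2.70, -0.70, 3.00]
eps = 0.05  # user-chosen pointwise error bound

codes = encode_residual(data, approx, eps)
recon = decode(approx, codes, eps)

# Rounding to the nearest bin keeps every point within eps (up to float round-off).
max_err = max(abs(d - r) for d, r in zip(data, recon))
print(codes, max_err)
```

The better the approximation, the more residual codes collapse to small integers near zero, which a standard entropy coder can then store compactly.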


Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

Category: Computer Vision

2022-12-07

'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). This has been an important research direction in Artificial Intelligence (AI) in general, but the study of RAC with visual and linguistic inputs is relatively recent. CLEVR_HYP (Sampat et al., 2021) is one such testbed for hypothetical vision-language reasoning with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine this encoder-decoder architecture with existing modality parsers and a scene graph question answering model to evaluate our proposed system on the CLEVR_HYP dataset. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
